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Abstract 

In this paper, we consider the problem of compressed sensing where the goal is to recover almost all the sparse 
vectors using a small number of fixed linear measurements. For this problem, we propose a novel partial hard- 
thresholding operator that leads to a general family of iterative algorithms. While one extreme of the family yields 
well known hard thresholding algorithms like ITI and HTP[17, 10], the other end of the spectrum leads to a novel 
algorithm that we call Orthogonal Matching Pursuit with Replacement (OMPR). OMPR, like the classic greedy 
algorithm OMP, adds exactly one coordinate to the support at each iteration, based on the correlation with the current 
residual. However, unlike OMP, OMPR also removes one coordinate from the support. This simple change allows us 
to prove that OMPR has the best known guarantees for sparse recovery in terms of the Restricted Isometry Property 
(a condition on the measurement matrix). In contrast, OMP is known to have very weak performance guarantees 
under RIP. Given its simple structure, we are able to extend OMPR using locality sensitive hashing to get OMPR- 
Hash, the first provably sub-linear (in dimensionality) algorithm for sparse recovery. Our proof techniques are novel 
and flexible enough to also permit the tightest known analysis of popular iterative algorithms such as CoSaMP and 
Subspace Pursuit. We provide experimental results on large problems providing recovery for vectors of size up to 
million dimensions. We demonstrate that for large-scale problems our proposed methods are more robust and faster 
than existing methods. 

1 Introduction 

We nowadays routinely face high-dimensional datasets in diverse application areas such as biology, astronomy, finance 
and the web. The associated curse of dimensionality is often alleviated by prior knowledge that the object being 
estimated has some structure. One of the most natural and well-studied structural assumption for vectors is sparsity. 
Accordingly, a huge amount of recent work in machine learning, statistics and signal processing has been devoted 
to finding better ways to leverage sparse structures. Compressed sensing, a new and active branch of modern signal 
processing, deals with the problem of designing measurement matrices and recovery algorithms, such that almost all 
sparse signals can be recovered from a small number of measurements. It has important applications in imaging, 
computer vision and machine learning (see, for example, [9, 24, 14]). 

In this paper, we focus on the compressed sensing setting [3, 7] where we want to design a measurement matrix 
A S R™x" such that a sparse vector x* G M" with ||a;*||o := | supp(a;*)| < k < n can be efficiently recovered from 
the measurements b = Ax* E W\ Initial work focused on various random ensembles of matrices A such that, if A 
was chosen randomly from that ensemble, one would be able to recover all or almost all sparse vectors x* from Ax* . 
[3] isolated a key property called the restricted Isometry property (RIP) and proved that, as long as the measurement 
matrix A satisfies RIP, the true sparse vector can be obtained by solving an -optimization problem, 

min ||a;|| 1 s.t. Ax = b . 

The above problem can be easily formulated as a linear program and is hence efficiently solvable. We recall for the 
reader that a matrix A is said to satisfy RIP of order k if there is some 6k £ [0,1) such that, for all x with ||x||o < k, 
we have 

(l - 6k)\\x\\^ < \\Ax\\^ < (l + 6k)\\x\\' . 
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Several random matrix ensembles are known to satisfy Sck < with high probability provided one chooses m = 
O (|| log ^) measurements. [2] showed that £i-minimization recovers all A:-sparse vectors provided A satisfies 52k < 
0.414 although the condition has been recently improved to S2k < 0.473 [11]. Note that, in compressed sensing, the 
goal is to recover all, or most, fc-sparse signals using the same measurement matrix A. Hence, weaker conditions such 
as restricted convexity [20] studied in the statistical literature (where the aim is to recover a single sparse vector from 
noisy linear measurements) typically do not suffice. In fact, if RIP is not satisfied then multiple sparse vectors x can 
lead to the same observation b, hence making recovery of the true sparse vector impossible. 

Based on its RIP guarantees, -minimization can guarantee recovery using just 0{k\og{n/k)) measurements, 
but it has been observed in practice that -minimization is too expensive in large scale applications [8], for example, 
when the dimensionaUty is in the milUons. This has sparked a huge interest in iterative methods for sparse recovery. 
An early classic iterative method is Orthogonal Matching Pursuit (OMP) [21,6] that greedily chooses elements to add 
to the support. It is a natural, easy-to-implement and fast method but unfortunately lacks strong theoretical guarantees. 
Indeed, it is known that, if run for k iterations, OMP cannot uniformly recover all /e-sparse vectors under an RIP 
condition of the form S2k < 6 [22, 18]. However, [26] showed that OMP, if run for 30fc iterations, recovers the 
optimal solution for S^ik < 1/3; a significantly more restrictive condition than the ones required by other methods 
like -minimization. 

Several other iterative approaches have been proposed that include Iterative Soft Thresholding (1ST) [17], Iterative 
Hard Thresholding (IHT) [1], Compressive Sampling Matching Pursuit (CoSaMP) [19], Subspace Pursuit (SP) [4], 
Iterative Thresholding with Inversion (ITI) [16], Hard Thresholding Pursuit (HTP) |10| and many others. Among 
the family of iterative hard thresholding algorithms, following 1 17|, we can identify two major subfamilies: one- and 
two-stage algorithms. As their names suggest, the distinction is based on the number of stages in each iteration of 
the algorithm. One-stage algorithms such as IHT, ITI and HTP, decide on the choice of the next support set and then 
usually solve a least squares problem on the updated support. The one-stage methods always set the support set to 
have size k, where k is the target sparsity level. On the other hand, two-stage algorithms, notable examples being 
CoSaMP and SP, first enlarge the support set, solve a least squares on it, and then reduce the support set back again 
to the desired size. A second least squares problem is then solved on the reduced support. These algorithms typically 
enlarge and reduce the support set by k or 2k elements. An exception is the two-stage algorithm FoBa [25] that adds 
and removes single elements from the support. However, it differs from our proposed methods as its analysis requires 
very restrictive RIP conditions {5^k < 0.1 as quoted in [14]) and the connection to locality sensitive hashing (see 
below) is not made. Another algorithm with replacement steps appears in [23]. However, the algorithm and the setting 
under which it is analyzed are different from ours. 

In this paper, we present, and provide a unified analysis for a family of one-stage iterative hard thresholding 
algorithms. The family is parameterized by a positive integer I < k. At the extreme value I = k, we recover the 
algorithm ITI/HTP. At the other extreme fc = 1, we get a novel algorithm that we call Orthogonal Matching Pursuit 
with Replacement (OMPR). OMPR can be thought of as a simple modification of the classic greedy algorithm OMP: 
instead of simply adding an element to the existing support, it replaces an existing support element with a new one. 
Surprisingly, this change allows us to prove sparse recovery under the condition S2k < 0-499. This is the best S2k 
based RIP condition under which any method, including £i-minimization, can be shown to provably perform sparse 
recovery. 

OMPR also lends itself to a faster implementation using locality sensitive hashing (LSH). This allows us to provide 
recovery guarantees using an algorithm whose run-time is provably sub-linear in n, the number of dimensions. An 
added advantage of OMPR, unlike many iterative methods, is that no careful tuning of the step-size parameter is 
required even under noisy settings or even when RIP does not hold. The default step-size of 1 is always guaranteed to 
converge to at least a local optima. 

Finally, we show that our proof techniques used in the analysis of the OMPR family are useful in tightening the 
analysis of two-stage algorithms, such as CoSaMP and SP, as well. As a result, we are able to prove better recovery 
guarantees for these algorithms — d4k < 0.35 for CoSaMP and < 0.35 for SP. We hope that this unified analysis 
sheds more light on the interrelationships between the various kinds of iterative hard thresholding algorithms. 

In summary, the contributions of this paper are as follows. 

• We present a family of iterative hard thresholding algorithms that on one end of the spectrum includes existing 
algorithms such as ITI/HTP while on the other end gives OMPR. OMPR is an improvement over the classical 
OMP method as it enjoys better theoretical guarantees and is also better practically as shown in our experiments. 

• Unlike other improvements over OMP, such as CoSaMP or SP, OMPR changes only one element of the support 
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at a time. This allows us to use Locality Sensitive Hashing (LSH) to speed it up resulting in the first provably 
sub-linear (in the ambient dimensionality n) time sparse recovery algorithm. 

• We provide a general proof for all the algorithms in our partial hard thresholding based family. In particular, we 
can guarantee recovery using OMPR, under both noiseless and noisy settings, provided 52k < 0.499. This is the 
least restrictive condition under which any efficient sparse recovery method is known to work. Furthermore, 
our proof technique can be used to provide a general theorem that provides the least restrictive known guarantees 
for all the two-stage algorithms such as CoSamp and SP (see Appendix D). 

AH proofs omitted from the main body of the paper can be found in the appendix. 

2 Orthogonal Matching Pursuit with Replacement 

Orthogonal matching pursuit (OMP), is a classic iterative algorithm for sparse recovery. At every stage, it selects a 
coordinate to include in the current support set by maximizing the inner product between columns of the measurement 
matrix A and the current residual h — Ax^ . Once the new coordinate has been added, it solves a least squares problem 
to fully minimize the error on the current support set. As a result, the residual becomes orthogonal to the columns of 
A that correspond to the current support set. Thus, the least squares step is also referred to as orthogonalization by 
some authors [5J. 

Let us briefly explain some of our notation. We use the MATLAB notation: 

A\l} := argmin \\Ax — h\\2 ■ 

X 

The hard thresholding operator Hk{ ) sorts its argument vector in decreasing order (in absolute value) and retains 
only the top k entries. It is defined formally in the next section. Also, we use subscripts to denote sub-vectors and 
submatrices, e.g. if / C [n] is a set of cardinaUty k and x e M", xi e M*^ denotes the sub-vector of x indexed by I. 
Similarly, Aj for a matrix A e M'"'^" denotes a sub-matrix of size mxk with colunms indexed by /. The complement 
of set / is denoted by / and xj denotes the subvector not indexed by /. The support (indices of non-zero entries) of a 
vector X is denoted by supp(x). 

Our new algorithm called Orthogonal Matching Pursuit with Replacement (OMPR ), shown as Algorithm 1 , differs 
from OMP in two respects. First, the selection of the coordinate to include is based not just on the magnitude of entries 
in {b — Ax*") but instead on a weighted combination a;* + rjA'^ {b — Ax^ ) with the step-size tj controlhng the relative 
importance of the two addends. Second, the selected coordinate replaces one of the existing elements in the support, 
namely the one corresponding to the minimum magnitude entry in the weighted combination mentioned above. 

Once the support It+i of the next iterate has been determined, the actual iterate a;*+^ is obtained by solving the 
least squares problem: 

x*~^^ = argmin \\Ax — b\\2 ■ 

X : supp(x)=/t+i 

Note that if the matrix A satisfies RIP of order k or larger, the above problem will be well conditioned and can be 
solved quickly and reliably using an iterative least squares solver. We will show that OMPR, unlike OMP, recovers any 
A;-sparse vector under the RIP based condition 52k < 0.499. This appears to be the least restrictive recovery condition 
(i.e., best known condition) under which any method, be it basis pursuit (^i -minimization) or some iterative algorithm, 
is guaranteed to recover all A;-sparse vectors. 

In the literature on sparse recovery, RIP based conditions of a different order other than 2k are often provided. It is 
seldom possible to directly compare two conditions, say, one based on 52k and the other based on 53k- Foucart [10] has 
given a heuristic to compare such RIP conditions based on the number of samples it takes in the Gaussian ensemble 
to satisfy a given RIP condition. This heuristic says that an RIP condition of the form 5ck < 6is less restrictive if the 
ratio is smaller. For the OMPR condition 52k < 0.499, this ratio is 2/0.499^ w 8 which makes it heuristicaUy 
the least restrictive RIP condition for sparse recovery. 

Theorem 1 (Noiseless Case). Suppose the vector x* G R" is k-sparse and the matrix A satisfies 52k < 0.499 and 
52 < 0.002. Then OMPR recovers e approximation to x* from measurements b — Ax* in 0{k log fc/e) iterations. 
Theorem 2 (Noisy Case). Suppose the vector x* G M" is k-sparse and the matrix A satisfies 52k < -499 and 
52 < 0.002. Then, in 0(klogk/e) iterations OMPR converges to C + e approximate solution, i.e., f{x) = l/2\\A{x — 
a;*) -h e|p < ||e|p/roOT measurements b = Ax* + e. C > Ois a universal constant and is dependent only on 52k- 
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Algorithm 1 OMPR Algorithm 2 OMPR (Q 
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Input: matrix A, vector b, sparsity level k 
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Input: matrix A, vector b, sparsity level k 
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Parameter: step size rj > 
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Parameter: step size t] > 
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for f = 1 to T do 
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for f = 1 to T do 
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jt+i ^ argmaxj.^^^ \z*+'^\ 
Jt+i ^ItU {jt+ij 
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topj_|_i ^ indices of top 1 elements of \z*j^^ 
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7 
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Jt+i ^ It^ top4+i 
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It+i ^ supp(y*+i) 
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Jt+i ^ supp(2/*+i) 
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x\+^ ^ Ai^.,\b, x\+^ ^ 
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end for 


11 


end for 



The above theorems are actually special cases of our convergence results for a family of algorithms that contains 
OMPR as a special case. We now turn our attention to this family. We note that the condition S2 < 0.002 is very mild 
and will typically hold for standard random matrix ensembles as soon as the number of rows sampled is larger than a 
fixed universal constant. 

3 A New Family of Iterative Algorithms 

In this section we show that OMPR is one particular member of a family of algorithms parameterized by a single 
integer I G {1, .... k}. The l-th member of this family, OMPR (I), shown in Algorithm 2, replaces at most I elements 
of the current support with new elements. OMPR corresponds to the choice 1 = 1. Hence, OMPR and OMPR (1) 
refer to the same algorithm. 

Our first result in this section connects the OMPR family to hard thresholding. Given a set I of cardinahty k, 
define the partial hard thresholding operator 

Hk{z;I,l) := argmin (1) 

IIj/IIoS*: 
|supp(y)\/|<i 

As is clear from the definition, the operator tries to find a vector y close to a given vector z under two constraints: (i) 
the vector y should have bounded support i\\y\\o < fc),and (ii) its support should not include more than I new elements 
outside a given support /. 

The name partial hard thresholding operator is justified because of the following reasoning. When / = k, the 
constraint I supp(2/)\/| < / is trivially implied by ||y||o < A: and hence the operator becomes independent of /. In fact, 
it becomes identical to the standard hard thresholding operator 

Hk {z; I, k) = Hk (z) := argmin \\y - z\\ . (2) 

li'llo<fe 

Even though the definition of Hk (z) seems to involve searching through (^') subsets, it can in fact be computed 
efficiently by simply sorting the vector z by decreasing absolute value and retaining the top k entries. 

The following result shows that even the partial hard thresholding operator is easy to compute. In fact, lines 6-8 
in Algorithm 2 precisely compute H^ ',Itj)- 

Proposition 3. Let \I\ = k and z be given. Then y = Hk {z] I, I) can be computed using the sequence of operations 

top = indices of top I elements of \zj\, J = /Utop, y = Hk{zj) . 

The proof of this proposition is straightforward and elementary. However, using it, we can now see that the 
OMPR (Z) algorithm has a simple conceptual structure. In each iteration (with current iterate x* having support 
It = supp(a;*)), we do the following: 
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1. (Gradient Descent) Form 2:*+^ = a;* — riA^{Ax* — b). Note that A^{Ax* — b) is the gradient of the objective 
function ^\\Ax — 6|p at x*. 

2. (Partial Hard Thresholding) Form by partially hard thresholding 2:*+^ using the operator Hk {-lit, I)- 

3. (Least Squares) Form the next iterate a;*+^ by solving a least squares problem on the support It+i of y*~^^. 

A nice property enjoyed by the entire OMPR family is guaranteed sparse recovery under RIP based conditions. 
Note that the condition under which OMPR (0 recovers sparse vectors becomes more restrictive as I increases. This 
could be an artifact of our analysis, as in experiments, we do not see any degradation in recovery abiUty as I is 
increased. 

Theorem 4 (Noiseless Case). Suppose the vector x* G K" is k-sparse. Then OMPR (I) recovers an e approximation to 
X* from measurements b = Ax* in 0{j log(l/e)) iterations provided we choose a step size r] that satisfies ri{l+S2i) < 

1 and 77(1 - 62k) > 1/2. 

Theorem 5 (Noisy Case). Suppose the vector x* € M" is k-sparse. Then OMPR (I) converges to aC + e-approximate 
solution, i.e., f{x) = l/2\\Ax — < ^^WeW^ from measurements b = Ax* + e in 0{j\og{{k + ||e||^)/e)) 

iterations provided we choose a step size r] that satisfies r]{l + ^2;) < 1 and 52k < 1 — 25^' where D = ^^^2 - 

Proof. Here we provide a rough sketch of the proof of Theorem 4; the complete proof is given in Appendix A. 

Our proof uses the following crucial observation regarding the structure of the vector = .t* — rjA^ ( Ax*- — b) . 
Due to the least squares step of the previous iteration, the current residual Ax* — 6 is orthogonal to columns of Aj^ . 
This means that 



-r^A^ {Ax* - b) . (3) 



As the algorithm proceeds, elements come in and move out of the current set It. Let us give names to the set of found 
and lost elements as we move from It to It+i : 

(found): Ft = It+M, (lost) : = /A/t+i- 

Hence, using (3) and updates for yt+i- = z*^^ = —r]A'^^A{x* — x*), and z*]^^ = x*i^. Now let f{x) = 

l/2ma; — 6||^, then using M/jper RIP and the fact that I supp(t/*+^ — a;*)| = \FtULt\ < 2Z, we can show that (details 
are in the Appendix A): 

,' + 1^ _ ^ ( ^+^-21 _ 1\ IL/+1||2 _L l±^ll^« l|2 



Furthermore, since 2/*+^ is chosen based on the k largest entries in z*j^^^, we have: = H-Zj;^"^!!^ > H-^Lt^lP = 

II^Ltll^ • Plugging this into (4), we get: 

- fix') < f 1 + - W'f . (5) 



The above expression shows that if 77 < then our method monotonically decreases the objective function 

and converges to a local optimum even if RIP is not satisfied (note that upper RIP bound is independent of lower RIP 
bound, and can always be satisfied by normalizing the matrix appropriately). 

However, to prove convergence to the global optimum, we need to show that at least one new element is added at 
each step, i.e., \Ft\ > 1. Furthermore, we need to show sufficient decrease, i.e, > c^f{x*). We show both 

these conditions for global convergence in Lemma 6, whose proof is given in Appendix A. 

Assuming Leimna 6, (5) shows that at each iteration OMPR (/) reduces the objective function value by at least a 
constant fraction. Furthermore, if a;*^ is chosen to have entries bounded by 1, then /(a;°) < (1 + 52k)k. Hence, after 
0{k/l log(fc/e)) iterations, the optimal solution x* would be obtained within e error. □ 

Lemma 6. Let 52k < 1 ~ 5^ '^^d 1/2 < 77 < 1. Then assuming /(x*) > 0, at least one new element is found i.e. 
Ft ^ 0. Furthermore, > ^cf{x*), where c = min(477(l — ry)^, 2(277 ~ > w a constant. 
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Special Cases: We have already observed that the OMPR algorithm of the previous section is simply OMPR (1). 
Also note that Theorem 1 immediately follows from Theorem 4. 

The algorithm at the other extreme of I = k has appeared at least three times in the recent literature: as Iterative 
(hard) Thresholding with Inversion (ITI) in [16], as S VP-Newton (in its matrix avatar) in [15], and as Hard Thresh- 
olding Pursuit (HTP) in [10]). Let us call it IHT-Newton as the least squares step can be viewed as a Newton step for 
the quadratic objective. The above general result for the OMPR family immediately implies that it recovers sparse 
vectors as soon as the measurement matrix A satisfies < 1/3. 

Corollary 7. Suppose the vector x* S K" is k-sparse and the matrix A satisfies 62k < 1/3. Then IHT-Newton 
recovers x* from measurements b = Ax* in 0(log(fc)) iterations. 

4 Tighter Analysis of Two Stage Hard Thresholding Algorithms 

Recently, [17] proposed a novel family of algorithms, namely two-stage hard thresholding algorithms. During each 
iteration, these algorithms add a fixed number (say I) of elements to the current iterate's support set. A least squares 
problem is solved over the larger support set and then I elements with smallest magnitude are dropped to form next 
iterate's support set. Next iterate is then obtained by again solving the least squares over next iterate's support set. See 
Appendix D for a more detailed description of the algorithm. 

Using proof techniques developed for our proof of Theorem 4, we can obtain a simple proof for the entire spectrum 
of algorithms in the two-stage hard thresholding family. 

Theorem 8. Suppose the vector x* S {—1, 1}" is k-sparse. Then the Two-stage Hard Thresholding algorithm with 
replacement size I recovers x* from measurements b = Ax* in 0{k) iterations provided: 62k+i < -35. 

Note that CoSaMP [19] and Subspace Pursuit(SP) [4] are popular special cases of the two-stage family. Using our 
general analysis, we are able to provide significantly less restrictive RIP conditions for recovery. 
Corollary 9. CoSaMP[19] recovers k-sparse x* G {—1, l}^from measurements b = Ax* provided < 0.35. 
Corollary 10. Subspace Pursuit[4] recovers k-sparse x* G {—1, 1}"' from measurements b = Ax* provided 63k < 
0.35. 

Note that CoSaMP's analysis given by [19] requires S4k < 0.1 while Subspace Pursuit's analysis given by [4] 
requires (^sfe < 0.205. See Appendix D in the supplementary material for proofs of the above theorem and corollaries. 



5 Fast Implementation Using Hashing 

In this section, we discuss a fast implementation of the OMPR method using locality-sensitive hashing. The main intu- 
ition behind our approach is that the OMPR method selects at most one element at each step (given by argmax^ | Aj (Ax' — 
&)|); hence, selection of the top most element is equivalent to finding the colunm Ai that is most "similar" (in magni- 
tude) to rt = Ax* — b, i.e., this may be viewed as the similarity search task for queries of the form rt and — r*. 

To this end, we use locality sensitive hashing (LSH) [12], a well known data- structure for approximate nearest- 
neighbor retrieval. Note that while LSH is designed for nearest neighbor search (in terms of Euclidean distances) and 
in general might not have any guarantees for the similar neighbor search task, we are still able to apply it to our task 
because we can lower-bound the similarity of the most similar neighbor. 

We first briefly describe the LSH scheme that we use. LSH generates hash bits for a vector using randomized hash 
functions that have the property that the probability of collision between two vectors is proportional to the similarity 
between them. For our problem, we use the following hash function: hu{x) = sign(u^a;), where u ^ N{0,I) is a 
random hyper-plane generated from the standard multivariate Gaussian distribution. It can be shown that [13] 

Pr[K{x,) = K{X2)] = 1 - - cos-i ( .. "^[f . 

Now, an s-bit hash key is created by randomly sampUng hash functions i.e., g{x) = [hu^ {x) , hu^ (cc) , . . . , (x)], 
where each Ui is sampled randomly from the standard multivariate Gaussian distribution. Next, q hash tables are 
constructed during the pre-processing stage using independently constructed hash key functions gi , 92 , ■ • ■ . 9q- During 
the query stage, a query is indexed into each hash table using hash-key functions gi, g2, ■ ■ ■ , gg and then the nearest 
neighbors are retrieved by doing an exhaustive search over the indexed elements. 

Below we state the following theorem from [ 1 2] that guarantees sub-linear time nearest neighbor retrieval for LSH. 
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(a) OMPR (b) OMP (c) IHT-Newton 

Figure 1: Phase Transition Diagrams for different methods. Red represents high probability of success while blue 
represents low probability of success. Clearly, OMPR recovers correct solution for a much larger region of the plot 
than OMP and is comparable to IHT-Newton. (Best viewed in color) 

Theorem 11. Let s — 0(log n) and q — 0(log 1 /6)n~ , then with probability \ — 5, LSH recovers (1 + e)-nearest 
neighbors, i.e., \\x' — r|p < (1 + e)||a;* — r|p, where x* is the nearest neighbor to r and x' is a point retrieved by 
LSH. 

However, we cannot directly use the above theorem to guarantee convergence of our hashing based OMPR algo- 
rithm as our algorithm requires finding the most similar point in terms of magnitude of the inner product. Below, we 
provide appropriate settings of the LSH parameters to guarantee sub-linear time convergence of our method under a 
slightly weaker condition on the RIP constant. A detailed proof of the theorem below can be found in Appendix B. 

Theorem 12. Let 52k < 1/4 — 7 '^^^ rj — 1 — j, where 7 > is a small constant, then with probability 1 — 6, OMPR 
with hashing converges to the optimal solution in 0{km,n^/ (i+f2(i/fc)) ^ g'j computational steps. 

The above theorem shows that the time complexity is sub-linear in n. However, currently our guarantees are not 
particularly strong as for large k the exponent of n will be close to 1. We believe that the exponent can be improved 
by more careful analysis and our empirical results indicate that LSH does speed up the OMPR method significantly. 

6 Experimental Results 

In this section we present empirical results to demonstrate accurate and fast recovery by our OMPR method. In the 
first set of experiments, we present phase transition diagram for OMPR and compare it to the phase transition diagram 
of OMP and IHT-Newton with step size 1. For the second set of experiments, we demonstrate robustness of OMPR 
compared to many existing methods when measurements are noisy or smaller in number than what is required for exact 
recovery. For the third set of experiments, we demonstrate efficiency of our LSH based implementation by comparing 
recovery error and time required for our method with OMP and IHT-Newton (with step-size 1 and 1/2). We do not 
present results for the £i/basis pursuit methods, as it has already been shown in several recent papers [10, 17] that the 
£1 relaxation based methods are relatively inefficient for very large scale recovery problems. 

In all the experiments we generate the measurement matrix by sampling each entry independently from the standard 
normal distribution J\f{0, 1) and then normalize each column to have unit norm. The underlying fc-sparse vectors are 
generated by randomly selecting a support set of size k and then each entry in the support set is sampled uniformly from 
{+1,-1}. We use our own optimized implementation of OMP and IHT-Newton. All the methods are implemented in 
MATLAB and our hashing routine uses mex files. 

6.1 Phase Transition Diagrams 

We first compare different methods using phase transition diagrams which are commonly used in compressed sensing 
literature to compare different methods [17]. We first fix the number of measurements to be m = 400 and generate 
different problem sizes by varying p — k/m and 6 = m/n. For each problem size [m, n, k), we generate random 
m X n Gaussian measurement matrices and fc-sparse random vectors. We then estimate the probability of success of 
each of the method by applying the method to 100 randomly generated instances. A method is considered successful 
for a particular instance if it recovers the underlying fc-sparse vector with at most 1% relative error. 
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1.48(0.3) 


1.24(0.6) 



(c) 



Figure 2: Error in recovery (||^a; — of n = 3000 dimensional vectors from m = 200 measurements, (a): Error 
incurred by various methods as the sparsity level k increases. Note that OMPR incurs the least error as it provably 
converges to at least a local minima for fixed step size i] = 1. (b): Error incurred by various methods as the noise level 
increases. Here again OMPR performs significantly better than the existing methods, (c); Difference in error incurred 
by IHT-Newton and OMPR , i.e., Error(IHT-Newton)-Error(OMPR ). Numbers in bracket denote confidence interval 
at 95% significance level. 



In Figure 1 , we show the phase transition diagram of our OMPR method as well as that of OMP and IHT-Newton 
(with step size 1). The plots shows probability of successful recovery as a function of p — m/n and S — k/ra. 
Figure 1 (a) shows color coding of different success probabilities; red represents high probability of success while 
blue represents low probability of success. Note that for Gaussian measurement matrices, the RIP constant 62k is 
less than a fixed constant if and only if to = Cfclog(n/fc), where C is a universal constant. This implies that 
\ = C log p and hence a method that recovers for high 62k will have a large fraction in the phase transition diagram 
where successful recovery probability is high. We observe this phenomenon for both OMPR and IHT-Newton method 
which is consistent with their respective theoretical guarantees (see Theorem 4). On the other hand, as expected, the 
phase transition diagram of OMP has a negligible fraction of the plot that shows high recovery probability. 

6.2 Performance for Noisy or Under-sampled Observations 

Next, we empirically compare performance of OMPR to various existing compressed sensing methods. As shown 
in the phase transition diagrams in Figure 1, OMPR provides comparable recovery to the IHT-Newton method for 
noiseless cases. Here, we show that OMPR is fairly robust under the noisy setting as well as in the case of under- 
sampled observations, where the number of observations is much smaller than what is required for exact recovery. 

For this experiment, we generate random Gaussian measurement matrix of size m = 200, n ~ 3000. We then 
generate random binary vector x of sparsity k and add Gaussian noise to it. Figure 2 (a) shows recovery error {\\Ax — 
h\\) incurred by various methods for increasing k and noise level of 10%. Clearly, our method outperforms the existing 
methods, perhaps a consequence of guaranteed convergence to a local minima for fixed step size 77 = 1. Similarly, 
Figure 2 (b) shows recovery error incurred by various methods for fixed fc = 50 and varying noise level. Here again, 
our method outperforms existing methods and is more robust to noise. Finally, in Figure 2 (c) we show difference in 
error incurred along with confidence interval (at 95% signficance level) by IHT-Newton and OMPR for varying levels 
of noises and k. Our method is better than IHT-Newton (at 95% signficance level) in terms of recovery error in around 
30 cells of the table, and is not worse in any of the cells but one. 

6.3 Performance of LSH based implementation 

Next, we empirically study recovery properties of our LSH based implementation of OMPR ( OMPR-Hash ) in the 
following real-time setup: Generate a random measurement matrix from the Gaussian ensemble and construct hash 
tables offline using hash functions specified in Section 5. Next, during the reconstruction stage, measurements arrive 
one at a time and the goal is to recover the underlying signal accurately in real-time. For our experiments, we generate 
measurements using random sparse vectors and then report recovery error \\Ax — h\\ and computational time required 
by each of the method averaged over 20 runs. 

In our first set of experiments, we empirically study the performance of different methods as k increases. Here, we 
fix m = 500, n = 500, 000 and generate measurements using n-dimensional random vectors of support set sizeTO/10. 
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k n(xlOOO) n(xlOOO) 



(a) (b) (c) 

Figure 3: (a): Error — b\\) incurred by various methods as k increases. The measurements b = Ax are computing 
by generating x with support size m/10. (b),(c): Error incurred and time required by various methods to recover 
vectors of support size 0. Im as n increases. IHT-Newton(l/2) refers to the IHT-Newton method with step size rj = 1/2. 



We then run different methods to estimate vectors x of support size k that minimize \\Ax — b\\. For our OMPR-Hash 
method, we use s — 20 bits hash-keys and generate q = ^Jn hash-tables. Figure 3 (a) shows the error incurred by 
OMPR , OMPR-Hash , and IHT-Newton for different fc (recall that k is an input to both OMPR and IHT-Newton). 
Note that although OMPR-Hash performs an approximation at each step, it is still able to achieve error similar to 
OMPR and IHT-Newton. Also, note that since the number of measurements are not enough for exact recovery by the 
IHT-Newton method, it typically diverges after a few steps. As a result, we use IHT-Newton with step size ?7 = 1/2 
which is always guaranteed to monotonically converge to at least a local minima (see Theorem 4). In contrast, in 
OMPR and OMPR-Hash can always set step size r\ aggressively to be 1. 

Next, we evaluate OMPR-Hash as dimensionality of the data n increases. For OMPR-Hash , we use s = log2(n) 
hash-keys and q = ^Jn hash-tables. Figures 3(b) and (c) compare error incurred and time required by OMPR-Hash 
with OMPR and IHT-Newton. Here again we use step size 77 = 1/2 for IHT-Newton as it does not converge for 77 = 1. 
Note that OMPR-Hash is an order of magnitude faster than OMPR while incurring slightly higher error OMPR-Hash 
is also nearly 2 times faster than IHT-Newton. 
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A Proofs related to OMPR: Exact Recovery Case 

Let us denote the objective function by f{x) = | \\Ax — Let It denote the support set of x* and /* be the support 
set of X*. Define the sets 

FAt = It\I* (false alarms) 

MDt = I*\It (missed detections) 

COt = It n J* (correct detections) . 

As the algorithms proceed, elements get in and move in and out of the current set It. Let us give names to the set of 
found and lost elements as we move from It to It+i- 

Ft = It+M (found) 
Ll = Il\Il+i (lost) . 

We first state two technical lemmas that we will need. These can be found in [19]. 
Lemma 13. For any S C [n], we have, 

\\I - A^AsW < S^si- 
Lemma 14. For any S,T c [n] such that S r\T = %,we have, 



Proof of Theorem 3 

Lemma 15. Let (52fe < 1 - ^. Then, in OMPR (I), 

< 2(2ry- < \\/+},f - \\x'^^J\\ 

Proof. Since x*j^ is the solution to the least squares problem mina; \\Ai^x — &||^, 

Al{Aj,x%-b) = 0. (6) 



Now, note that 



f{x') = ^\\Aj,xi-Aj.x*.f, 

= liixifAliAjy - Aj.x*) - {x*j.fAj4Ajy - Aj.x*)), 
= -lixMDyAlfj,^{Aiyj^-Aj.x*j.), by (6) 



= 7r:{x*MDyz'+l. by (3) (7) 



Hence, 



2rj 



= \\xMDM' + \\zlihf-^vf{x'). (8) 
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That is, 

^ I^MDj 11^ + IkiTAt 11^ + IkcOt ~ Xcot 11^ ~ I^FAt 11^ + II^MDj 11^' 



ll-r* - 'r*l|2 -I- 11^ _ IP 

||X X II T ||^;\^£) II ll-'-FAtll ) 



1 

'2k ' 



2 



where the third line follows from the fact that MDt, FAt, and COt are disjoint sets. 
As /(a;*) > and S2k < ^ ~ 'i^'^^ above inequality implies 



< 2(2,?- — < i|z^i,J|2 - llx^^JI^ 



□ 



(9) 



Next, we provide a lemma that bounds the function value / (a;*) in terms of missed detections MDt and also z*j^^^ . 
Lemma 16. Let /(a;*) = 5 1| Aa;* — b = Ax*, S^k < ^ — and r] < 1. Then, at each step, 

^^IkWf < < ^;^^2\\^'^kr 

Proof. Now, using Lemma 2 of [4] with I = MDt, J = luV = ^MDtX*Mjj^ we get 

f{x') = \\\Ax*-hr 

= IWMix* - x*)i, - AmdAdJ^ (10) 



S2k ^ ^ 



1-5 



>i(l--^) PMD,a;J^,^,f 



2 



^^l-T^) (l-'^'fe)II^Wf by RIP 



> 



(1-2^2-^2 



The assumption that (52fe < l-^^and?? < 1 implies that 52fc < 1-^ < 1/2. The function a i-^- (l-2a)2/(2(l-a)) 
is decreasing on [0, 1 /2] and hence (11) implies 



^(^*) ^ 2(1- l + f) = '-^\\x*mdA'- (12) 



Next, using (7) and Cauchy-Schwarz inequahty: 



l|2 



fix 



.t\2 



,J|^>V^^^. (13) 



"XMDt 



The result now follows using the above equation with (12). □ 
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Lemma 17. Let 62k < 1 ~ 5^ '^^^ 1/2 < i] < 1. Then assuming /(a;*) > 0, at least one new element is found i.e. 
Ft ^ 0. Furthermore, > |c/(x*), where c = min(4r7(l - jj)^, 2(27/ - j^s^)) > Oisa constant. 

Proof. We consider the following three exhaustive cases: 

1. \Ft\ < I and \Ft\ < \MDt\: Let S C \MDt\Ft\, s.t, l^] = \Ft\ - \MDt n Ft\. Now, 

\SU{MDtnFt)\ = \Ft\, \{MDt\Ft)\S\ = \MDt\ - \Ft\. 

Now, as consists of top Ft elements of z^^^y^: 

ll4uWnF.)f <ll2/^*f- (14) 

Furthermore, since \Ft\ < I, hence every element of ZMDt\Ft smaller in magnitude than every element of 
•"^FAiXLt ' Otherwise that element should have been included in Ft . Furthermore, | MDt | — | -Ft | = | FAt | — | | < 
\FAt\Lt\. Hence, 

\\^{MDt\Ft)\s\\ - \\^FAt\Lt\\ ^ I^FaJI ) (15) 

Adding (14) and (15), we get: 

\\zXDj'<\WFt'r + \WFAj\'- (16) 

Using above equation along with Lemma 15, we get: 

,,t+l||2 ~ ( - 1 



Now, note that if =0, then y*p^ = implying that /(a;*) = 0. Hence, at least one new element is added, 
i.e., y'+' 0. 



2. \Ft\=l< \MDt\: By definition of y^"^: 



\Ft\ - \MDt\ 
Hence, using Lemma 16 and the fact that \Ft \ = I: 

- pZD^^"^^ " - i^'^^^ ~ ^^^^ 

as \MDt\ < k. 

3. \Ft\ > \MDt\: Since, 2/^+^ is the top most elements of z*+^. Hence, assuming \Ft\ > \MDt\, 
Now, using Lenmia 16: 

||y^!'||'> 477(1 (19) 
We get the lemma by combining bounds for all the three cases, i.e., (17), (18), (19). □ 
Now we give a complete proof of Theorem 4. 
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Proof. We have, 

- fix') = - xYA-^Aix' - X*) + l/2p(y*+i - x')f, 

< - xTA^Aix' - x*) + ii±M(||y^+i||2 + ||^*^j|2). (20) 

where the second inequahty follows by using the fact that y*i^\r\i^ = x\^ ^^^^ and using RIP of order 21 (since 

|supp(y*+i-x*)| = |i^tULt|<20. 

Since x*j^ is obtained using least squares, 

A]^A{x' - X*) = 0. 
Thus, Al^A{x* -X*) = 0, because Lt C I^. Next, note that 

y'+' = -vAlA{x* - X*). 

Hence, 



/(y*+^)-/(a^*)< (^^-^)lbrir + ^^lKf. (21) 



Furthermore, since y + is chosen based on the k largest entries in z we have, 

Ib^f = 114^11' >ll4t^f = Kir- 

Plugging this into (21), we get: 

f{y'^') - fix') < {1+621-'-) iiy^nr 

Now, using Lemma 17, Hj/^^P > jcf{x^) and therefore, 

/(a;*+i) - fix') < fiy*+') - fix') < -a^fi^') 



where a = c (l + 821 — > since 77(1 + S21) < 1. Hence, 



fix'+')<il-a-)fix*)<e-"-^fix*). 

The above inequality shows that at each iteration OMPR (0 reduces the objective function value by a fixed multi- 
plicative factor. Furthermore, if is chosen to have entries bounded by 1, then /(a;°) < (1 + S2k)k. Hence, after 
Oij log((l + S2k)k/e)) iterations, the function value reduces to e, i.e., /(x*) < e. □ 

B Proofs related to the LSH Section 

Lemma 18. Let \\x\\ = Ifor all points x in our database. Let x* be the nearest neighbor to r in L2 distance metric, 
and let r'^x* > c > 0. Then, a (1 + ae)-nearest neighbor to r is also a {1 — e)-similar neighbor to r, where 

a< 2c 



l+rTr-2c- 

Proof. Let a;' be a (1 + ae)-nearest neighbor to r, then: 

Ik'-rf < (l + ae)||x* - rll 
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Using 1 1 a;' 1 1 = ||a;*|| = 1 and simplifying, we get: 

„T „i \ ^^ ,\„T„* 



P'x' > (1 - e)P'x* + (a + \)er'^x* - yCl + r'^r), 
> (1 - e)r^x* + {{a + l)c - |(1 + r^r))e. 

Hence, a;' is a (1 — e)-approximate similar neighbor to r if: 

(a + l)c > ^(1 +r^r). 

The result follows after simplification. □ 

We now provide a proof of Theorem 7. 

Proof. Let us first consider a single step of OMPR . Now, similar to Lemma 15, we can show that if (52fe < 1/4 — 7and 

= 1 - 7, 7 > 0, then H^mdJP > III-'^faJP- Setting e = 1 - y^, implies that (1 - e) maxjz^J^J > min |xp,^J, 

i.e., a (1 — e)-similar neighbor to max \z^MDt I ^^^^ ^^^^ ^^^^ ^ constant decrease in the objective function. 

So, the goal is to ensure that with probabihty 1 — 5, ^ > 0, for all the 0{k) iterations, our LSH method returns at 

least a (1 — e)-similar neighbor to max l-z^^^ | where e = 1 — To this end, we need to ensure that at each step t, 

LSH finds at least a (1 — e)-similar neighbor to max \z^MDt I ^i^h probability at least 1 — 5/k. Using Lemma 18, we 

need to find a (1 + ae)-nearest neighbor to max l-z^At I' where 

< 2c 

l + r^r-2c' 

and r'^x* > c. Using Lemma 17, a = 0{l/k). Hence the result now follows using Theorem 6 (main text). □ 

C Extension to Noisy Case 

In this section, we consider the noisy case in which our objective function is f{x) = | \\Ax — 6|| ^, where b = Ax* + e 
and e G M™ is the "noise" vector. 

Let It denote the support set of x* and /* be the support set of x*. Define the sets 

FAt = It\I* (false alarms) 

MDt = I*\It (missed detections) 

COt = ItDl* (correct detections) . 

Lemma 19. Let /(a;*) > ^\\ef and 62k < 1 - 25;^, where D = Then, 

-||4aJI'>c/(x*), 

where c = 2 ^^+^^' (27?£' - > 0. 

Proof. Since x*j^ is the solution to the least squares problem minj^ \\Aj^x — 

Al{Ai,x% - b) = 0. (22) 
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Now, note that 

1 



fi^') = ^\\AiA,-b\ 



2 



liixifAliAj.x' - 6) - b^{Ajy - b)), 
-^b'^{Aj,xl-b), 

^{^Mofz'^k - \^^^^U< - b), (23) 



where the third equaUty follows from (22). 
Now, 



I^MDt ~ -^M^Dtll^ — II^MdJI^ + II^M£)tll^ ~ "^i^^t DtY" ^*MDt 



So, 



IkWf + W^Mof - Mf{x') + \e^{AiA. - b)) (24) 



< - x*r - ^ ii^m j2 _ 4^(_^^^t^ ^ ie^(A4^ - 6)), 



1 - 

Now, by assumption: /(a;*) > §||e|p. Hence, 

\\A{x* - x*)\\ < p(x* - X*) - e\\ + ||e||, 
<2(l + -^)V(x*). 

Hence, 

2 (^2r;(l " ^) " 3^(1 + ^)') /(^*) < H^mAJI' " H^faJP 

Now, by assumption (52fc < 1 - 55;?' where D = Hence, c = 2 ^^+^^' (27?J) - > 0. □ 

Next, we provide a lemma that bounds the function value /(x*) in terms of missed detection MDt and also z^J^^^ . 

Lemma 20. Let f{x^) = ^WAx*- - 5|p > f ||e|p, b = Ax* + e, 62k < - """^ ^ = (%+^)^ ' ^"'^'^ 
step, 
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Proof. First we lower bound / (a;* ) ■ 



,/J(^) = l=\\Ax' - Ax* - e\\, 

>l={\\Ax'-Ax*\\-\\e\\), 
>l^l^^jnmJ\Ax-Ax*\\-\\e\ 

^ 1 ( |, . ||_||,| 



where last equality follows from Lemma 16. Using the above inequahty with /(x ) > ^ ||e|| , we get: 

The assumption that (52fc < 1 — ^i^d < 1 implies that 52k < 1 — 2^ < 1/2. The function a 1-^ {1 — 
2a)^/(2(l — a)) is decreasing on [0, 1/2] and hence the above equation implies 

/(x*)>^i^^||xUJI^. (27) 
Now, we upper bound f{x*). Using definition of /(a;*): 

Now, using Cauchy-Schwarz and /(a;*) > ^||e|p, 

\e^{Ai,xi - 6)1 < \\e\\\\Ai,xi - b\\ < ^/(^*)- 



Hence, 
That is. 



2 fr^t\2 (,rr< 1^2 



||.K.f > 4,M I - ^) ^ > 4,(1 - Unf^^^^n^'). (28) 



where the second inequality follows from (27). □ 

Next, we present the following lemma that shows "enough" progress at each step: 
Lemma 21. Let /(a;*) > ^||e||2, < 1 and S2k < 1 ^ ihrj' ^^^''^ D = 1 — ^ ^ . Then at least one new element 

is found i.e. Ft ^ 0. Furthermore, > {o.f{x^), where a = min(4r;(l - Drif ^^~k^^ , 2 (^+^)' {2r]D - 

1 \ )) > Ois a constant. 

Proof. As for the exact case, we analyse the following three exhaustive cases: 

1. I Ft I < I and |-Ft| < \MDt\: Here we use the exactly similar argument as the exact case to obtain the following 
inequality (see (16)): 

Iki^DjI' <ll2/Frf + KaJI'- (29) 

Using Lemma 19, we get: 

Ib^tl' > cf{x% (30) 

where c is as defined in Lemma 19. Now, note that if \Ft\ = 0, then y^^^ = implying that /(a;*) = 0. Hence, 
at least one new element is added, i.e., y*^^ =^ 0. 
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2. \Ft\=l< \MDt\: By definition of 



\Ft\ - \MDt\ 
Hence, using Lemma 20 and the fact that \Ft \ = I: 



as \MDt\ < k. 

3. \Ft\ > \MDt\: Since, 2/^+^ is the top most elements of Hence, assuming \Ft\ > \MDt\, 

\\yV;r>\\z',^hf- 

Now, using Lennma 20: 

Ib^ti' > 4^(1 - Dr,r^^^^^fix% (32) 

We get the lemma by combining bounds for all the three cases, i.e., (30), (31), (32). □ 

Now, we provide a proof of Theorem 2. 
Proof. We have, 

- fix') = - xYA'^iAx* -b) + l/2||A(y*+i - 



< - xYA^iAx' -b) + Il±M(||y^+i||2 + ||^t^j|2). (33) 



where the second inequality follows by using the fact that 2//^^^/^ = a;/^^^^^^ and using RIP of order 21 (since 

I supp(y*+i - x*)\ = \Ft U Lt\ < 21). 

Since x\^ is obtained using least squares, 

AliAx'-b) = 0. 
That is, {Ax' - b) = 0, because Lf C h- Next, note that 

y'+' = -r]A'^p^{Ax'-b). 

Hence, 

/(2/'^^)-/(-*)< (^^-^)l|y^rf + ^^Kf. (34) 

Furthermore, since j/*"*"^ is chosen based on largest entries in , we have, 

Ib^rf = ll4rf >ll4rf = Kf . 

Plugging this into (34), we get: 

/(y*+^)-/(a.*)<(i + ^2.-^) m'f. 
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Algorithm 3 Two-stage(0 



1: Input: matrix A, vector b, sparsity level k 

2: Initialize 

3: for t = 1 to T do 

4: topj_|_i ^ indices of top I elements of | {Ax* — b) \ 
5: Jt+i <(- /t U topt_,_i 

6: 4+^ ^Aj^.,\b, 

•'t+i '+1 ^ ' Jt+i 

8: /t+i ^ supp(y*+i) 

9: a;*/i <(-^/,,A6, x*+i ^0 

Jt+l -"*+! \ ' /t+1 

10: end for 



Now, using Lemma 21, Hy^^lP > cxf{x*) > and therefore, 

- fix*) < f{y*+') - fix*) 

where c' = ^^li^^^l) ^ a > since t?(1 + 821) < 1. The above inequaUty shows that at each iteration OMPR (0 
reduces the objective function value by a fixed multiplicative factor Furthermore, if x'^ is chosen to have entries 
bounded by 1, then f{x^) < 0((1 + S2k)k + ||e||2). Hence, after 0(f log((A; + ||e|p)/e)) iterations, the function 
value reduces to C||e|| V2 + e. □ 

D Analysis of 2-stage Algorithms 

In this section, we consider the family of two-stage hard thresholding algorithms (see Algorithm 3) introduced by [ 1 7] . 

We now provide a simple analysis for the general two-stage hard thresholding algorithms. We first present a few 
technical lemmas that we will need for our proof. 

Lemma 22. Let b = Ax*, where I* = supp(a;*). Also, let x = argmirig^ (^wj H^a; — 6||^. Then, 



J\\ix-x*)jni4' + \\^i\i4' = - < -r 
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Proof. A similar inequaUty appears in [10] and we rewrite the proof here. Since x/ is the solution to min„ — 6||2, 



AJiAixi -b) = 0. 



In the exact case, b = Ax*. Hence, 



Now, using (35): 



x-x*)Tf = [ix-x*)i oy 
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0=[ix-x*)i ofAlAc 
where G = [I /*\J]. Subtracting (37) from (36) we get, 

Wix - x*)if = [{x - x*)i of (7 - A^Ag) 

< 524ix - x'')i\\^\\ix-x*)i\\^ + \\x*,^,\\ 
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where the second inequality follows using Lemma 13. Lemma follows by just rearranging terms now. □ 

We now present our main theroem and its proof for two-stage thresholding algorithms. 
Theorem 23. Suppose the vector x* e M" is k-sparse and binary. Then Two-stage(l) recovers x* from measurements 
b = Ax* in 0{k) iterations provided: 

hk+i < -35 

Proof. As is the least squares solution over support set Jt+i, hence: 

f{z*+') - fix') < - f{x% (39) 

where = i^' " vA^{Ax* - b))j,^„ t? = ^ and 4+^\ = 0. 
Now, 

/(s*+i) - fix') = (s*+i - xYA^iAx' -b) + ips*+^ - Ax^f. (40) 

Now, as x* is the least squares solution over It. Hence, Aj^ (Ax* — 6) = 0. Hence, 

(a*+i - x')i, = 0, - ccOtop,^, = -VAI^^^^ iAx' - b), (5*+^ - x')j^^^ = 0. (41) 

Using (40) and (41): 

- fix') = -r,\\A[,^^JAx' - b)r + !^||Aop,,,A';p,+,(^^' - b)f, 
< -vWAl^^JAx' - b)f + !l-^^\\Al^^JAx' - 6)||^ 
= -l\\Al^^JAx'-b)f. (42) 
Now, let MDt be the set of missed detections, i.e., MDt = /*\/*. Then, by definition of top(_|_i: 

WA^o^^JAx* - b)f > min (l, - 6)11^. (43) 

Furthermore, 



W^IidM^* -b)\\ = \\AIidAmd,x*mo^ - AIjo^AiJx' - x*)i,\\, 

> \\AjfD,AMD,X*MDj - \\Al,a^Ai^ix' - x*)i,\\, 

> (1 - Sk)\\x*MnA - f'\, \\x*mdM (44) 

where last inequaUty follows using Lemma 14 and Lemma 22. 
Hence, using (42), (43), and (44): 

/(.-V/M</(.-V/(«')<-5(J^».in(l,pg^j^)(l-**-^^ IK„„.f. (45) 
Next, we upper bound increase in the objective function by removing I elements from 2;*+^. 

- fiz'+') = (2/*+^ - -b) + \\\Ay'+' - Az'+'\\\ 

= l\\Ay'+^-Az'+Y, 
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where the second equation follows as is a least squares solution, and both y*"*"^, z*~^^'s support is a subset of 
Jt+i- The third equation follows from RIP and the fact that = y*i^^^ ■ 
Now, using Lemma 22: 

l!4tVll'^T%^ll-^u.JI'- (47) 

Furthermore, | Jt+i\/t-|-i| = I < \ Jt+i\I*\. Hence, by definition of It+i, 

||2 ^ ^ ||2 



Using above equation and (47), we get: 

||2 ^ ^ "2fe+; II * ||2 C^ox 

ll^j.+A/*+i II ^ I j,+,\7*| 1 _ Sl,^, ll'^^*\-^'+i II ' (48) 

Also, |Jf+i\/*| = l+\r\Jt+i\ < I + \MDt\. Using (46), (48), and the fact that /(x'+i) < f(y^+^) and each 
x*j, = 1: 



1 + Si S^2k^ 

l + \I*\Jt+i\ 2 1-61,^1 



Adding (45) and (49), we get: 



'(50) 

Now, UP$^ < min(«, < min(Z, |MA|). 

Hence, 

Now consider: 

A)' " ^^^^) - - - (' + 

> 0.01, (52) 

where the second inequality follows by substituting S2k+i < -35. 
Hence, using (51) and (52), we have: 

/(a;*+^) < fix*) - mm{l, \MDt\) ■ 0.0001. (53) 

The above equation guarantees convergence to the optima in at least 0{k) steps although faster convergence can be 
shown for larger k. □ 

Corollary 24. Cosamp converges to the optima provided 

Sik < 0.35. 

Corollary 25. Subspace-Pursuit converges to the optima provided 

53k < 0.35. 

Note that CoSamp's analysis given by [19] requires 64k < 0.1 while Subspace pursuit's analysis given by [4] 
requires 63k < 0.205. Note that our generic analysis provides significantly better guarantees for both the methods. 



22 



